Machine Translation Through Clausal Syntax: A Statistical Approach for Chinese to English, by Dan Lowe Wheeler
Abstract
Language pairs with largely differing word order, such as Chinese and English, have proved to be among the greatest challenges in statistical machine translation. One reason is that such techniques usually work with sentences as flat strings of words, rather than explicitly attempting to parse any sort of hierarchical structural representation. Because even simple syntactic differences between languages can quickly lead to a universe of idiosyncratic surface-level word reordering rules, many believe the near future of machine translation will lie heavily in syntactic modeling. The time to start may be now: advances in statistical parsing over the last decade have already started opening the door. Following the work of Cowan et al., I present a statistical tree-to-tree translation system for Chinese to English that formulates the translation step as a prediction of English clause structure from Chinese clause structure. Chinese sentences are segmented and parsed, split into clauses, and independently translated into English clauses using a discriminative feature-based model. Clausal arguments, such as subject and object, are translated separately using an off-the-shelf phrase-based translator. By explicitly modeling syntax at a clausal level, but using a phrase-based (flat-sentence) method on local, reduced expressions, such as clausal arguments, I aim to address the current weakness in long-distance word reordering while still leveraging the excellent local translations that today's state of the art has to offer.
Thesis Supervisor: Michael Collins. Title: Associate Professor of Computer Science
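The division of labor described in the abstract (clause structure handled by one model, clausal arguments handed to a phrase-based translator) can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the `Clause` type, the `translate_clause` function, the fixed subject-verb-object template, and the two lookup tables standing in for the discriminative clause model and the off-the-shelf phrase-based translator. It is not the thesis's actual implementation, and it starts from an already segmented, parsed, and split clause.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class Clause:
    """Hypothetical parsed source clause: a main verb plus role-labeled arguments."""
    verb: str
    args: Dict[str, str]  # role -> source-language phrase, e.g. {"subj": ..., "obj": ...}


def translate_clause(
    clause: Clause,
    predict_verb: Callable[[str], str],      # stand-in for the clause-structure model
    translate_phrase: Callable[[str], str],  # stand-in for a phrase-based translator
    template: Tuple[str, ...] = ("subj", "verb", "obj"),  # assumed English order
) -> str:
    # Arguments are translated independently of the clause skeleton.
    slots = {role: translate_phrase(text) for role, text in clause.args.items()}
    # The verb (and, in a real system, the whole English clause structure)
    # would be predicted by a separate model; here it is a simple lookup.
    slots["verb"] = predict_verb(clause.verb)
    # Word order comes from the target-side template, not the source string.
    return " ".join(slots[role] for role in template if role in slots)


# Toy lookup tables (illustrative only).
PHRASES = {"我": "I", "书": "the book"}
VERBS = {"读": "read"}

# Source order in a ba-construction is subject, object, verb: 我 把 书 读 了.
clause = Clause(verb="读", args={"subj": "我", "obj": "书"})
english = translate_clause(clause, VERBS.__getitem__, PHRASES.__getitem__)
print(english)  # I read the book
```

Because the output order is dictated by the target-side template, the object moves after the verb without any surface-level reordering rule, while each argument is still translated locally.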
Similar resources
A New Subtree-Transfer Approach to Syntax-Based Reordering for Statistical Machine Translation
In this paper we address the problem of translating between languages with word order disparity. The idea of augmenting statistical machine translation (SMT) by using a syntax-based reordering step prior to translation, proposed in recent years, has been quite successful in improving translation quality. We present a new technique for extracting syntax-based reordering rules, which are derived ...
Decoding Optimization for Chinese-English Machine Translation via a Dependent Syntax Language Model
Decoding is a core process of statistical machine translation and determines its final results. In this paper, a decoding optimization for Chinese-English SMT with a dependent syntax language model is proposed, in order to improve the performance of the decoder in Chinese-English statistical machine translation. The data set was first trained in a dependent language model, and the...
Bilingual Sentiment Consistency for Statistical Machine Translation
In this paper, we explore bilingual sentiment knowledge for statistical machine translation (SMT). We propose to explicitly model the consistency of sentiment between the source and target side with a lexicon-based approach. The experiments show that the proposed model significantly improves Chinese-to-English NIST translation over a competitive baseline.
A new model for Persian multi-part words edition based on statistical machine translation
Multi-part words in English are hyphenated, with the hyphen separating the different parts. Persian has multi-part words as well. Based on Persian morphology, a half-space character is needed to separate the parts of multi-part words, but in many cases people incorrectly use a space character instead of the half-space character. This common incorrect use of space leads to some s...
Dependency-based Pre-ordering for Chinese-English Machine Translation
In statistical machine translation (SMT), syntax-based pre-ordering of the source language is an effective method for dealing with language pairs where there are great differences in their respective word orders. This paper introduces a novel pre-ordering approach based on dependency parsing for Chinese-English SMT. We present a set of dependency-based preordering rules which improved the BLEU ...
Publication date: 2009